feat: add timeout param to client Dataset.encode and document num_proc #232

Merged: tpx818 merged 2 commits into modelscope:main from kevssim:encode_proc on Jun 23, 2026
Conversation

kevssim (Collaborator) commented Jun 23, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

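Adds a timeout parameter (default 600 seconds) to twinkle_client.dataset.Dataset.encode and passes it through to the underlying http_post. This fixes HTTP request timeouts when tokenizing large datasets or tokenizing with multiple processes.

  • Modify build_method in client_tools/client_generator.py to inject timeout: int = 600 into the encode method only and forward it to http_post; all other generated methods are unchanged.
  • Re-run the generator to update the auto-generated output, including src/twinkle_client/dataset/*.py.
  • Add tests/twinkle_client/test_client_timeout.py, which verifies in TDD style that both the encode signature and the http_post call carry timeout.
  • Add a note to the Chinese and English Dataset docs on using num_proc to speed up encode; add a timeout usage example to the Twinkle client docs.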
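The generator change described above can be illustrated with a minimal sketch. The real build_method in client_tools/client_generator.py is not shown on this page, so build_method_source, the endpoint strings, and the fake_http_post stand-in below are all hypothetical; only the idea (inject timeout into encode and forward it to http_post, leave other methods alone) comes from the PR description.

```python
def build_method_source(name: str, endpoint: str) -> str:
    """Return Python source for one generated client method (hypothetical template)."""
    if name == "encode":
        # Only encode receives the extra keyword, which is forwarded to http_post.
        signature = "def encode(self, timeout: int = 600, **kwargs):"
        call = f"return http_post({endpoint!r}, json=kwargs, timeout=timeout)"
    else:
        # Every other generated method keeps its original shape.
        signature = f"def {name}(self, **kwargs):"
        call = f"return http_post({endpoint!r}, json=kwargs)"
    return f"{signature}\n    {call}\n"


# Stand-in for the real http_post so the sketch runs without a server.
calls = []


def fake_http_post(url, json=None, **extra):
    calls.append((url, json, extra))
    return {"ok": True}


# Compile the generated source with the stand-in transport in scope.
namespace = {"http_post": fake_http_post}
exec(build_method_source("encode", "/dataset/encode"), namespace)
exec(build_method_source("map", "/dataset/map"), namespace)

namespace["encode"](None, batched=True, num_proc=8, timeout=3600)
namespace["map"](None, batched=True)
```

After running this, calls records a timeout keyword for the encode request and none for the map request, which is the behavior the PR describes for the generated client.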

Usage:

from twinkle_client.dataset import Dataset

# `dataset` is an already-constructed twinkle_client Dataset instance; timeout is in seconds.
dataset.encode(batched=True, num_proc=8, timeout=3600)
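The checks that the PR description attributes to tests/twinkle_client/test_client_timeout.py (the encode signature carries timeout, and the http_post call receives it) might look roughly like the sketch below. The actual test file is not shown here, so this uses a stand-in Dataset class and a stand-in transport object rather than the real twinkle_client modules; the endpoint path is also an assumption.

```python
import inspect
import types
from unittest import mock


def _real_http_post(url, json=None, timeout=None):
    # Stand-in transport: the sketch should never reach the network.
    raise RuntimeError("network access is not expected in this sketch")


# Hypothetical holder for the transport function so it can be patched.
transport = types.SimpleNamespace(http_post=_real_http_post)


class Dataset:
    """Stand-in for the generated client class (not the real twinkle_client one)."""

    def encode(self, timeout: int = 600, **kwargs):
        return transport.http_post("/dataset/encode", json=kwargs, timeout=timeout)


def check_encode_timeout():
    """Return (default_is_600, forwarded_timeout) for the stand-in client."""
    param = inspect.signature(Dataset.encode).parameters["timeout"]
    default_ok = param.default == 600
    # Replace the transport with a mock and inspect the call it receives.
    with mock.patch.object(transport, "http_post", return_value={}) as fake:
        Dataset().encode(batched=True, num_proc=8, timeout=3600)
    forwarded = fake.call_args.kwargs["timeout"]
    return default_ok, forwarded
```

Checking the signature with inspect catches a regenerated client that silently drops the parameter, while the mocked call confirms the value actually reaches the HTTP layer.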

gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a default timeout of 600 seconds for the encode method in the client generator, updating both Dataset and LazyDataset classes, and documents this change along with multi-process parallelism (num_proc) in the documentation. It also adds save_as and flush_save methods to the base dataset client. The reviewer feedback suggests improving the robustness of the timeout injection by allowing Optional[int] to disable timeouts, avoiding duplicate injection if the parameter is already present, and restricting the injection specifically to the Dataset and LazyDataset classes to prevent unexpected behavior in other classes with an encode method.
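The three reviewer suggestions (accept Optional[int] so a timeout can be disabled, skip injection when timeout is already declared, and restrict injection to Dataset and LazyDataset) can be sketched as a single helper. This is an illustration under assumptions: with_timeout_param and its list-of-strings parameter representation are hypothetical and are not taken from client_tools/client_generator.py.

```python
from typing import List

# Optional[int] lets callers pass timeout=None to disable the timeout.
TIMEOUT_PARAM = "timeout: Optional[int] = 600"
# Only these classes get the injected parameter.
TIMEOUT_CLASSES = {"Dataset", "LazyDataset"}


def _param_name(param: str) -> str:
    """Extract the bare name from a parameter string such as 'x: int = 1'."""
    return param.split(":")[0].split("=")[0].strip()


def with_timeout_param(class_name: str, method_name: str, params: List[str]) -> List[str]:
    """Return params with a timeout keyword injected where appropriate."""
    if class_name not in TIMEOUT_CLASSES or method_name != "encode":
        return list(params)
    if any(_param_name(p) == "timeout" for p in params):
        # Already declared: do not inject a duplicate.
        return list(params)
    result = list(params)
    # Insert before *args / **kwargs so the generated signature stays valid.
    index = next((i for i, p in enumerate(result) if p.startswith("*")), len(result))
    result.insert(index, TIMEOUT_PARAM)
    return result
```

For example, with_timeout_param("Dataset", "encode", ["self", "**kwargs"]) places the timeout parameter between self and **kwargs, while a class outside the allow-list is returned unchanged.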


Comment thread client_tools/client_generator.py Outdated
Comment thread client_tools/client_generator.py Outdated
tpx818 merged commit 866515d into modelscope:main on Jun 23, 2026
1 of 3 checks passed